Ranking Retrieval Systems with Partial Relevance Judgements

Authors

  • Shengli Wu
  • Fabio Crestani
Abstract

Measures such as mean average precision and recall level precision are considered good system-oriented measures because they reflect both precision and recall, two important aspects of effectiveness evaluation for information retrieval systems. However, these measures suffer from some shortcomings when partial relevance judgments are used. In this paper, we discuss how to rank retrieval systems under partial relevance judgments, a condition that is common in major retrieval evaluation events such as the TREC conferences and NTCIR workshops. Four system-oriented measures are discussed: mean average precision, recall level precision, normalized discounted cumulative gain, and normalized average precision over all documents. Our investigation shows that averaging values over a set of queries may not be the most reliable approach to ranking a group of retrieval systems. Some alternatives, such as Borda count, Condorcet voting, and the zero-one normalization method, are investigated. Experimental results are also presented for the evaluation of these methods.
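The abstract names the ranking alternatives but does not define them, so the following is only a minimal sketch of how averaging, Borda count, and zero-one normalization can disagree. It assumes the textbook Borda count (a system earns one point for every system it beats on a query) and reads zero-one normalization as per-query min-max scaling; both are assumptions, not the authors' exact formulations. The system names and scores are invented for illustration, and Condorcet voting is left out for brevity.

```python
from typing import Dict, List

# Hypothetical per-query scores (for example, average precision) for three
# systems over three queries. The numbers are invented, not from the paper.
scores: Dict[str, List[float]] = {
    "sysA": [0.90, 0.10, 0.20],
    "sysB": [0.30, 0.20, 0.30],
    "sysC": [0.20, 0.30, 0.40],
}


def mean_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by their score averaged over all queries."""
    means = {s: sum(v) / len(v) for s, v in scores.items()}
    return sorted(means, key=means.get, reverse=True)


def borda_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by Borda points summed over queries.

    On each query the best of n systems gets n-1 points, the next n-2,
    and so on. Ties within a query are broken by input order (assumption).
    """
    systems = list(scores)
    n_queries = len(next(iter(scores.values())))
    points = {s: 0 for s in systems}
    for q in range(n_queries):
        ordered = sorted(systems, key=lambda s: scores[s][q], reverse=True)
        for rank, s in enumerate(ordered):
            points[s] += len(systems) - 1 - rank
    return sorted(points, key=points.get, reverse=True)


def zero_one_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by the sum of per-query min-max normalized scores.

    On each query the best score maps to 1 and the worst to 0, so no single
    query can dominate the total (assumed reading of "zero-one").
    """
    systems = list(scores)
    n_queries = len(next(iter(scores.values())))
    totals = {s: 0.0 for s in systems}
    for q in range(n_queries):
        column = [scores[s][q] for s in systems]
        lo, hi = min(column), max(column)
        for s in systems:
            if hi > lo:
                totals[s] += (scores[s][q] - lo) / (hi - lo)
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    print("mean:    ", mean_ranking(scores))
    print("borda:   ", borda_ranking(scores))
    print("zero-one:", zero_one_ranking(scores))
```

With this toy data, sysA tops the averaged ranking only because of one very high score on the first query, while both rank-based alternatives place it last. That is the kind of sensitivity to individual queries that motivates looking beyond plain averaging.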


Similar resources

The Effect of Query Characteristics on Retrieval Results in the Trec Retrieval Tests

There have been three Text Retrieval Conferences (TREC) organized by the National Institute of Standards and Technology (NIST) over the last three years which have compared retrieval results on fairly large databases (at least 1 gigabyte). The queries (called topics), relevance judgements and databases were all provided by NIST. The main goal of the tests was to compare various retrieval algori...


Variation of Relevance Assessments for Medical Image Retrieval

Evaluation is crucial for the success of most research domains, and image retrieval is no exception to this. Recently, several benchmarks have been developed for visual information retrieval such as TRECVID, ImageCLEF, and ImagEval to create frameworks for evaluating image retrieval research. An important part of evaluation is the creation of a ground truth or gold standard to evaluate systems ...


Using Structural Relationships for Focused XML Retrieval

In focused XML retrieval, information retrieval systems have to find out which are the most appropriate retrieval units and return only these to the user, avoiding overlapping elements in the result lists. This paper studies structural relationships between elements and explains how they can be used to produce a better ranking for a focused task. We analyse relevance judgements to find the most...


On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents

We consider the problem of acquiring relevance judgements for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used ma...


Relevance Judgements for Assessing Recall

Recall and precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognised as problematic. To compare performance of different systems, standard co...



Journal:
  • J. UCS

Volume 14  Issue

Pages  -

Publication date 2008